Hi
Stata has a standalone command "pca" for doing principal components analysis. But it also has an option "pcf" for "factor". I am one of the many people who are confused about the difference between the two, since a) some people claim that they do different things and b) they clearly produce (somewhat) different results.
This question has been asked multiple times here over the years and the answers here
https://www.statalist.org/forums/for...using-pcf-vs-p
and here
https://www.stata.com/statalist/arch.../msg00321.html
seem to imply that
1) "pca" does "real" PCA, while "factor, pcf" does "factor analysis using principal component analysis for factor extraction", and these are actually different things
2) In SPSS the only way to do "PCA" at all is to do "factor analysis using principal component analysis for factor extraction", and this used to be true of Stata as well until the pca command was developed.
But this still leaves me with some (related) questions
1) How exactly are PCA and "factor analysis using principal component analysis for factor extraction" different and why do they give such different answers for the loadings? I haven't seen any stats textbooks that make this distinction, only Stata.
2) Given that other programs (like SPSS) don't seem to make a distinction between these two approaches, how much does it really matter?
3) Are the different loadings I get for these two commands actually providing the same exact information in different form? If so, how are they related? If not, on what basis should I decide which to use?
Does anyone have any insight on these questions?
If you are curious, the differing results can be seen with the example data for "factor":
. webuse bg2
(Physician-cost data)
. factor bg2cost1-bg2cost6, pcf
(obs=568)
Factor analysis/correlation                  Number of obs    =        568
Method: principal-component factors          Retained factors =          2
Rotation: (unrotated)                        Number of params =         11
--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      1.70622      0.30334            0.2844       0.2844
     Factor2 |      1.40288      0.49422            0.2338       0.5182
     Factor3 |      0.90865      0.18567            0.1514       0.6696
     Factor4 |      0.72298      0.05606            0.1205       0.7901
     Factor5 |      0.66692      0.07456            0.1112       0.9013
     Factor6 |      0.59236            .            0.0987       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
    bg2cost1 |   0.3581    0.6279 |      0.4775
    bg2cost2 |  -0.4850    0.5244 |      0.4898
    bg2cost3 |  -0.5326    0.5725 |      0.3886
    bg2cost4 |  -0.4919    0.3254 |      0.6521
    bg2cost5 |   0.6238    0.3962 |      0.4539
    bg2cost6 |   0.6543    0.3780 |      0.4290
-------------------------------------------------
. pca bg2cost1-bg2cost6
Principal components/correlation             Number of obs    =        568
                                             Number of comp.  =          6
                                             Trace            =          6
Rotation: (unrotated = principal)            Rho              =     1.0000
--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      1.70622      .303339            0.2844       0.2844
       Comp2 |      1.40288      .494225            0.2338       0.5182
       Comp3 |      .908652      .185673            0.1514       0.6696
       Comp4 |      .722979     .0560588            0.1205       0.7901
       Comp5 |       .66692      .074563            0.1112       0.9013
       Comp6 |      .592357            .            0.0987       1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
----------------------------------------------------------------------------------------
    Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained
-------------+------------------------------------------------------------+-------------
    bg2cost1 |   0.2741    0.5302   -0.2712   -0.7468   -0.0104   -0.1111 |           0
    bg2cost2 |  -0.3713    0.4428   -0.4974    0.2800    0.2996    0.5005 |           0
    bg2cost3 |  -0.4077    0.4834    0.0656    0.2466   -0.5649   -0.4646 |           0
    bg2cost4 |  -0.3766    0.2748    0.7266   -0.2213    0.4504    0.0538 |           0
    bg2cost5 |   0.4776    0.3345    0.3829    0.1950   -0.3942    0.5657 |           0
    bg2cost6 |   0.5009    0.3192    0.0144    0.4647    0.4824   -0.4453 |           0
----------------------------------------------------------------------------------------
.
You can see that "pca" and "factor, pcf" give identical eigenvalues, but very different loadings for the first two factors/components. Of course, the substantive "story" of the factors/components (in terms of positive/negative loadings) seems similar in both analyses.
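One thing that can be checked directly from the printed output is whether the two sets of numbers differ only by a per-column scale factor. The sketch below, in Python rather than Stata, multiplies each "pca" eigenvector column by the square root of its eigenvalue and compares the result with the "factor, pcf" loadings. All figures are copied from the output above; the function names (rescale, max_abs_diff) are made up for this illustration, and the scaling rule itself is an assumption being tested here, not something quoted from the Stata documentation.

```python
import math

# Eigenvalues of the first two components/factors, copied from the output above.
eigenvalues = [1.70622, 1.40288]

# "pca" eigenvectors (Comp1, Comp2), copied from the output above.
pca_eigenvectors = {
    "bg2cost1": (0.2741, 0.5302),
    "bg2cost2": (-0.3713, 0.4428),
    "bg2cost3": (-0.4077, 0.4834),
    "bg2cost4": (-0.3766, 0.2748),
    "bg2cost5": (0.4776, 0.3345),
    "bg2cost6": (0.5009, 0.3192),
}

# "factor, pcf" loadings (Factor1, Factor2), copied from the output above.
pcf_loadings = {
    "bg2cost1": (0.3581, 0.6279),
    "bg2cost2": (-0.4850, 0.5244),
    "bg2cost3": (-0.5326, 0.5725),
    "bg2cost4": (-0.4919, 0.3254),
    "bg2cost5": (0.6238, 0.3962),
    "bg2cost6": (0.6543, 0.3780),
}


def rescale(eigenvectors, eigenvalues):
    """Multiply each eigenvector column by sqrt(its eigenvalue)."""
    scales = [math.sqrt(ev) for ev in eigenvalues]
    return {
        var: tuple(coef * s for coef, s in zip(row, scales))
        for var, row in eigenvectors.items()
    }


def max_abs_diff(a, b):
    """Largest absolute elementwise gap between two tables with the same keys."""
    return max(abs(x - y) for var in a for x, y in zip(a[var], b[var]))


rescaled = rescale(pca_eigenvectors, eigenvalues)
for var, row in rescaled.items():
    print(var, [round(x, 4) for x in row], pcf_loadings[var])
print("largest gap:", max_abs_diff(rescaled, pcf_loadings))
```

If the largest gap is no bigger than the four-decimal rounding in the printed tables, that would suggest the two commands carry the same information for the retained components, just normalized differently: unit-length eigenvectors from "pca" versus columns whose squared entries sum to the eigenvalue from "factor, pcf".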